0), ((0.46875, 1.0), 0), ((0.375, 0.75), 0), ((0.0, 0.7), 0), ((0.625, 0.3), 1), ((0.9375, 0.5), 1)]
centroids = [(0.22395833333333334, 0.63), (0.79375, 0.188)]
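Each centroid is simply the coordinate-wise mean of the points assigned to its cluster. A minimal Python sketch, using the scaled points from the output above, reproduces both centroids (the function name is illustrative):

blue = [(0.09375, 0.2), (0.25, 0.65), (0.15625, 0.48),
        (0.46875, 1.0), (0.375, 0.75), (0.0, 0.7)]
red = [(0.53125, 0.04), (0.875, 0.1), (1.0, 0.0),
       (0.625, 0.3), (0.9375, 0.5)]

def centroid(points):
    # Coordinate-wise average of the cluster's points.
    return tuple(sum(coords) / len(points) for coords in zip(*points))

print(centroid(blue))  # (0.22395833333333334, 0.63), up to float rounding
print(centroid(red))   # (0.79375, 0.188), up to float rounding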
The blue cluster contains the scaled features (0.09375,0.2), (0.25,0.65), (0.15625,0.48), (0.46875,1), (0.375,0.75), and (0,0.7), corresponding to the unscaled pairs (23,50000), (28,95000), (25,78000), (35,130000), (32,105000), and (20,100000). The red cluster contains the scaled features (0.53125,0.04), (0.875,0.1), (1,0), (0.625,0.3), and (0.9375,0.5), corresponding to the unscaled pairs (37,34000), (48,40000), (52,30000), (40,60000), and (50,80000).
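The correspondence between the scaled and unscaled pairs is min-max rescaling of each coordinate: in this data, age ranges over [20, 52] and income over [30000, 130000], so, for example, age 23 maps to (23 - 20)/(52 - 20) = 0.09375. A short sketch of the rescaling (variable names are illustrative):

data = [(23, 50000), (28, 95000), (25, 78000), (35, 130000),
        (32, 105000), (20, 100000), (37, 34000), (48, 40000),
        (52, 30000), (40, 60000), (50, 80000)]

min_age, max_age = min(a for a, _ in data), max(a for a, _ in data)
min_inc, max_inc = min(i for _, i in data), max(i for _, i in data)

scaled = [((a - min_age) / (max_age - min_age),
           (i - min_inc) / (max_inc - min_inc)) for a, i in data]
print(scaled[0])  # (0.09375, 0.2), the first blue point above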
So Peter belongs to the red cluster. What proportion of the red cluster's members, not counting Peter, are house owners? 2 out of 4, or 1/2, of the people in the red cluster are house owners. Thus the red cluster to which Peter belongs does not seem to have high predictive power for determining whether Peter would be a house owner or not. We may try to cluster the data into more clusters in the hope of obtaining a purer cluster that is more reliable for predicting house ownership for Peter. Let us therefore try to cluster the data into three clusters.
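The purity computation above can be written out directly. In the following sketch the member labels and ownership flags are hypothetical placeholders, since the text states only that two of the four red-cluster members other than Peter own a house:

red_members = ["Peter", "A", "B", "C", "D"]  # hypothetical labels for the 5 red points
owns_house = {"A": True, "B": True, "C": False, "D": False}  # hypothetical flags

others = [m for m in red_members if m != "Peter"]
proportion = sum(owns_house[m] for m in others) / len(others)
print(proportion)  # 0.5: too impure to predict Peter's ownership reliably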