Page 144 - Data Science Algorithms in a Week
Clustering into K Clusters
43.0)), (14, (36.0, 38.0)), (16, (36.0, 38.0)), (17, (36.0,
39.0)), (18, (37.0, 38.0))]
Cluster 1: [(3, (24.0, 28.0)), (5, (32.0, 34.0)), (6, (24.0,
27.0)), (7, (29.0, 32.0)), (8, (35.0, 35.0)), (9, (33.0,
36.0)), (11, (22.0, 27.0)), (15, (30.0, 32.0))]
We would like to determine the expected number of children for the 15th couple (30, 32),
i.e. where the wife is 30 years old and the husband is 32 years old. (30, 32) is in cluster 1.
The couples in cluster 1 are: (24.0, 28.0), (32.0, 34.0), (24.0, 27.0), (29.0, 32.0), (35.0, 35.0),
(33.0, 36.0), (22.0, 27.0), (30.0, 32.0). Of these, the couples that are among the first 14 couples
used as data are: (24.0, 28.0), (32.0, 34.0), (24.0, 27.0), (29.0, 32.0), (35.0, 35.0), (33.0,
36.0), (22.0, 27.0). The average number of children for these 7 couples is est15=8/7~1.14.
This is the estimated number of children for the 15th couple based on the data from the
first 14 couples.
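The selection step above can be sketched in Python. This is a minimal illustration under stated assumptions: the cluster list is copied from the 2-cluster output, the total of 8 children is taken from the fraction 8/7 in the text (the per-couple counts are not listed in this section), and the names `KNOWN_COUPLES` and `known_members` are illustrative rather than taken from the book's code.

```python
# Cluster 1 from the 2-cluster output: (couple index, (wife age, husband age)).
cluster_1 = [
    (3, (24.0, 28.0)), (5, (32.0, 34.0)), (6, (24.0, 27.0)),
    (7, (29.0, 32.0)), (8, (35.0, 35.0)), (9, (33.0, 36.0)),
    (11, (22.0, 27.0)), (15, (30.0, 32.0)),
]

# Only the first 14 couples have a known number of children.
KNOWN_COUPLES = 14
known_members = [m for m in cluster_1 if m[0] <= KNOWN_COUPLES]

# Total number of children among those couples, as implied by 8/7 in the
# text; the individual counts are not shown in this section.
total_children = 8

est15 = total_children / len(known_members)
print(len(known_members), round(est15, 2))  # 7 1.14
```

Filtering by couple index keeps the 15th couple itself out of its own estimate, which is why the average is taken over 7 couples rather than 8.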
The estimated number of children for the 16th couple is est16=23/7~3.29. The estimated
number of children for the 17th couple is also est17=23/7~3.29, since both the 16th and 17th
couples belong to the same cluster.
Now we will calculate the error E2 (2 for 2 clusters) between the estimated number of
children (e.g. denoted est15 for the 15th couple) and the actual number of children
(e.g. denoted act15 for the 15th couple) as follows:
E2=sqrt(sqr(est15-act15)+sqr(est16-act16)+sqr(est17-act17))
=sqrt(sqr(8/7-1)+sqr(23/7-0)+sqr(23/7-3))~3.3
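As a quick check of the arithmetic, the error can be recomputed in Python. This is a sketch only: the estimates and actual values are the ones substituted into the formula above, and the helper name `estimation_error` is hypothetical, not part of the book's code.

```python
from math import sqrt

# Estimated and actual numbers of children for couples 15, 16 and 17,
# as substituted into the formula in the text.
estimates = {15: 8 / 7, 16: 23 / 7, 17: 23 / 7}
actuals = {15: 1, 16: 0, 17: 3}


def estimation_error(estimates, actuals):
    """Square root of the sum of squared differences (hypothetical helper)."""
    return sqrt(sum((estimates[i] - actuals[i]) ** 2 for i in estimates))


E2 = estimation_error(estimates, actuals)
print(round(E2, 1))  # 3.3
```

The same helper can be reused with the estimates obtained for other numbers of clusters, so that the resulting errors are directly comparable.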
Now that we have calculated the error E2, we will calculate the estimation errors for
other numbers of clusters. We will choose the number of clusters with the least error
to estimate the number of children for the 18th couple.
Output for 3 clusters:
Cluster 0: [(1, (48.0, 49.0)), (2, (40.0, 43.0)), (4, (49.0, 42.0)), (10,
(42.0, 47.0)), (12, (41.0, 45.0)), (13, (39.0, 43.0))]
Cluster 1: [(3, (24.0, 28.0)), (6, (24.0, 27.0)), (7, (29.0, 32.0)), (11,
(22.0, 27.0)), (15, (30.0, 32.0))]
Cluster 2: [(5, (32.0, 34.0)), (8, (35.0, 35.0)), (9, (33.0, 36.0)), (14,
(36.0, 38.0)), (16, (36.0, 38.0)), (17, (36.0, 39.0)), (18, (37.0, 38.0))]