Page 144 - Data Science Algorithms in a Week

Clustering into K Clusters


                             43.0)), (14, (36.0, 38.0)), (16, (36.0, 38.0)), (17, (36.0,
                             39.0)), (18, (37.0, 38.0))]
                             Cluster 1: [(3, (24.0, 28.0)), (5, (32.0, 34.0)), (6, (24.0,
                             27.0)), (7, (29.0, 32.0)), (8, (35.0, 35.0)), (9, (33.0,
                             36.0)), (11, (22.0, 27.0)), (15, (30.0, 32.0))]

            We would like to determine the expected number of children for the 15th couple (30,32),
            i.e. where the wife is 30 years old and the husband is 32 years old. (30,32) is in cluster 1.
            The couples in cluster 1 are: (24.0, 28.0), (32.0, 34.0), (24.0, 27.0), (29.0, 32.0), (35.0, 35.0),
            (33.0, 36.0), (22.0, 27.0), (30.0, 32.0). Of these, the couples that are among the first 14 couples
            used as the data are: (24.0, 28.0), (32.0, 34.0), (24.0, 27.0), (29.0, 32.0), (35.0, 35.0), (33.0,
            36.0), (22.0, 27.0). The average number of children for these couples is est15=8/7~1.14.
            This is the estimated number of children for the 15th couple based on the data from the
            first 14 couples.
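
The filtering and averaging step described above can be sketched in Python as follows. This is only a minimal sketch, not the book's program: the function name `estimate_children` and the `children` mapping (couple number to number of children) are hypothetical, and since the per-couple numbers of children are not listed on this page, they have to be supplied by the caller.

```python
def estimate_children(cluster_members, children, known_count=14):
    """Average the number of children over the cluster members that are
    among the first `known_count` couples (the couples used as the data).

    cluster_members: couple numbers in the cluster (assumed input format).
    children: dict mapping couple number -> number of children (assumed input).
    """
    known = [c for c in cluster_members if c <= known_count]
    if not known:
        raise ValueError("no known couples in this cluster")
    return sum(children[c] for c in known) / len(known)


# Couple numbers in cluster 1, taken from the 2-cluster output above.
cluster_1 = [3, 5, 6, 7, 8, 9, 11, 15]
known_in_cluster_1 = [c for c in cluster_1 if c <= 14]
print(known_in_cluster_1)  # [3, 5, 6, 7, 8, 9, 11]
```

Since these seven couples have 8 children in total, the function would return 8/7 for cluster 1, which agrees with est15.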

            The estimated number of children for the 16th couple is est16=23/7~3.29. The estimated
            number of children for the 17th couple is also est17=23/7~3.29, since both the 16th and
            the 17th couples belong to the same cluster.
            Now we will calculate the error E2 (2 for 2 clusters) between the estimated number of
            children (e.g. denoted est15 for the 15th couple) and the actual number of children
            (e.g. denoted act15 for the 15th couple) as follows:
            E2=sqrt(sqr(est15-act15)+sqr(est16-act16)+sqr(est17-act17))

            =sqrt(sqr(8/7-1)+sqr(23/7-0)+sqr(23/7-3))~3.3
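
The arithmetic above can be checked with a short Python sketch. The function name `estimation_error` is hypothetical; the estimated and actual values are the ones given in the formula above.

```python
import math


def estimation_error(estimates, actuals):
    """Square root of the sum of squared differences between estimates and actual values."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimates, actuals)))


# Values for the 15th, 16th and 17th couples, as given in the text.
estimates = [8 / 7, 23 / 7, 23 / 7]
actuals = [1, 0, 3]

e2 = estimation_error(estimates, actuals)
print(round(e2, 1))  # 3.3
```

Almost all of the error comes from the 16th couple, where the estimate 23/7 is compared against an actual value of 0.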

            Now that we have calculated the error E2, we will calculate the errors of the estimation with
            other numbers of clusters. We will choose the number of clusters with the least error
            to estimate the number of children for the 18th couple.
            Output for 3 clusters:

                Cluster 0: [(1, (48.0, 49.0)), (2, (40.0, 43.0)), (4, (49.0, 42.0)), (10,
                (42.0, 47.0)), (12, (41.0, 45.0)), (13, (39.0, 43.0))]
                Cluster 1: [(3, (24.0, 28.0)), (6, (24.0, 27.0)), (7, (29.0, 32.0)), (11,
                (22.0, 27.0)), (15, (30.0, 32.0))]
                Cluster 2: [(5, (32.0, 34.0)), (8, (35.0, 35.0)), (9, (33.0, 36.0)), (14,
                (36.0, 38.0)), (16, (36.0, 38.0)), (17, (36.0, 39.0)), (18, (37.0, 38.0))]









