Page 28 - Data Science Algorithms in a Week

Classification Using K Nearest Neighbors


            Analysis:

            For this problem, we will use the k-NN algorithm, where k is the number of closest
            neighbors that we look at. Given a white point, it will be classified as a water area if the
            majority of its k closest neighbors are in the water area, and classified as land if the majority
            of its k closest neighbors are in the land area. We will use the Euclidean metric for the distance:
            given two points $X=[x_0,x_1]$ and $Y=[y_0,y_1]$, their Euclidean distance is defined as
            $d_{\text{Euclidean}} = \sqrt{(x_0-y_0)^2+(x_1-y_1)^2}$.
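
As a minimal sketch of the distance formula above, the following Python function computes the Euclidean distance between two points in the plane. The function name `euclidean_distance` and the sample points are illustrative assumptions, not taken from the book.

```python
import math


def euclidean_distance(x, y):
    """Return the Euclidean distance between 2D points x=[x0, x1] and y=[y0, y1]."""
    # Square the difference in each coordinate, sum them, and take the square root.
    return math.sqrt((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2)


# Illustrative points forming a 3-4-5 right triangle.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```
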
            The Euclidean distance is the most common metric. Given two points on a piece of paper,
            their Euclidean distance is simply the straight-line distance between them, as measured by a
            ruler and shown in the diagram:













            To apply the k-NN algorithm to an incomplete map, we have to choose the value of k. Since
            the resulting class of a point is the class of the majority of the k closest neighbors of that
            point, k should be odd, so that a vote between the two classes (water and land) can never end
            in a tie. Let us apply the algorithm for the values k=1,3,5,7,9.
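
The majority vote for several values of k can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `knn_classify` and the nine labeled points in `known` are made up for the example and do not come from the book's map.

```python
import math
from collections import Counter


def knn_classify(point, labeled_points, k):
    """Classify `point` by majority vote among its k nearest labeled neighbors."""
    # Order the known points by their Euclidean distance to the query point.
    by_distance = sorted(
        labeled_points,
        key=lambda item: math.hypot(point[0] - item[0][0], point[1] - item[0][1]),
    )
    # Count the classes of the k closest neighbors and return the most frequent one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical known points (not the book's data): (coordinates, class).
known = [
    ((0, 0), "water"), ((1, 0), "water"), ((0, 1), "water"),
    ((5, 5), "land"), ((6, 5), "land"), ((5, 6), "land"),
    ((6, 6), "land"), ((7, 6), "land"), ((6, 7), "land"),
]

# Classify one unknown ("white") point for each odd value of k.
for k in (1, 3, 5, 7, 9):
    print(k, knn_classify((1, 1), known, k))
```

With this made-up data, the point (1, 1) is classified as water for k=1,3,5 and as land for k=7,9, because the larger neighborhoods take in more of the distant land points. This shows how the choice of k can change the result.
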
            Applying this algorithm to every white point of the incomplete map will result in the
            following completed maps:























