Page 28 - Data Science Algorithms in a Week
P. 28
Classification Using K Nearest Neighbors
Analysis:
For this problem, we will use the k-NN algorithm - k here means that we will look at k
closest neighbors. Given a white point, it will be classified as a water area if the majority of
its k closest neighbors are in the water area, and classified as land if the majority of its k
closest neighbors are in the land area. We will use the Euclidean metric for the distance:
given two points X=[x0,x ] and Y=[y0,y ], their Euclidean distance is defined as d Euclidean =
1
1
2
sqrt((x0-y0) +(x -y ) ).
2
1
1
The Euclidean distance is the most common metric. Given two points on a piece of paper,
their Euclidean distance is just the length between the two points, as measured by a ruler, as
shown in the diagram:
To apply the k-NN algorithm to an incomplete map, we have to choose the value of k. Since
the resulting class of a point is the class of the majority of the k closest neighbors of that
point, k should be odd. Let us apply the algorithm for the values of k=1,3,5,7,9.
Applying this algorithm to every white point of the incomplete map will result in the
following completed maps:
[ 16 ]