Page 24 - Data Science Algorithms in a Week

P. 24

Classification Using K Nearest Neighbors

# Apply knn algorithm to the 2d data using the k-nearest neighbors with
# the Manhattan distance.
# The dictionary data comes in the form with keys being 2d coordinates
# and the values being the class.
# x,y are integer coordinates for the 2d data with the range
# [x_from,x_to] x [y_from,y_to].
def knn_to_2d_data(data, x_from, x_to, y_from, y_to, k):
new_data = {}
info = {}
# Go through every point in an integer coordinate system.
for y in range(y_from, y_to + 1):
for x in range(x_from, x_to + 1):
info_reset(info)
# Count the number of neighbors for each class group for
# every distance dist starting at 0 until at least k
# neighbors with known classes are found.
for dist in range(0, x_to - x_from + y_to - y_from):
# Count all neighbors that are distanced dist from
# the point [x,y].
if dist == 0:
info_add(info, data, x, y)
else:
for i in range(0, dist + 1):
info_add(info, data, x - i, y + dist - i)
info_add(info, data, x + dist - i, y - i)
for i in range(1, dist):
info_add(info, data, x + i, y + dist - i)
info_add(info, data, x - dist + i, y - i)
# There could be more than k-closest neighbors if the
# distance of more of them is the same from the point
# [x,y]. But immediately when we have at least k of
# them, we break from the loop.
if info['nbhd_count'] >= k:
break
class_max_count = None
# Choose the class with the highest count of the neighbors
# from among the k-closest neighbors.
for group, count in info['class_count'].items():
if group is not None and (class_max_count is None or
count > info['class_count'][class_max_count]):
class_max_count = group
new_data[x, y] = class_max_count
return new_data

[ 12 ]

19 20 21 22 23 24 25 26 27 28 29