Page 24 - Data Science Algorithms in a Week
P. 24

Classification Using K Nearest Neighbors


                # Apply knn algorithm to the 2d data using the k-nearest neighbors with
                # the Manhattan distance.
                # The dictionary data comes in the form with keys being 2d coordinates
                # and the values being the class.
                # x,y are integer coordinates for the 2d data with the range
                # [x_from,x_to] x [y_from,y_to].
                def knn_to_2d_data(data, x_from, x_to, y_from, y_to, k):
                    new_data = {}
                    info = {}
                    # Go through every point in an integer coordinate system.
                    for y in range(y_from, y_to + 1):
                        for x in range(x_from, x_to + 1):
                            info_reset(info)
                            # Count the number of neighbors for each class group for
                            # every distance dist starting at 0 until at least k
                            # neighbors with known classes are found.
                            for dist in range(0, x_to - x_from + y_to - y_from):
                                # Count all neighbors that are distanced dist from
                                # the point [x,y].
                                if dist == 0:
                                    info_add(info, data, x, y)
                                else:
                                    for i in range(0, dist + 1):
                                        info_add(info, data, x - i, y + dist - i)
                                        info_add(info, data, x + dist - i, y - i)
                                    for i in range(1, dist):
                                        info_add(info, data, x + i, y + dist - i)
                                        info_add(info, data, x - dist + i, y - i)
                                # There could be more than k-closest neighbors if the
                                # distance of more of them is the same from the point
                                # [x,y]. But immediately when we have at least k of
                                # them, we break from the loop.
                                if info['nbhd_count'] >= k:
                                    break
                            class_max_count = None
                            # Choose the class with the highest count of the neighbors
                            # from among the k-closest neighbors.
                            for group, count in info['class_count'].items():
                                if group is not None and (class_max_count is None or
                                   count > info['class_count'][class_max_count]):
                                    class_max_count = group
                            new_data[x, y] = class_max_count
                    return new_data








                                                     [ 12 ]
   19   20   21   22   23   24   25   26   27   28   29