Page 25 - Data Science Algorithms in a Week
Classification Using K Nearest Neighbors
Input:
The program above reads its input data from the file below, which contains
the table of known data about Mary's temperature preferences:
# source_code/1/mary_and_temperature_preferences/mary_and_temperature_preferences.data
10 0 cold
25 0 warm
15 5 cold
20 3 warm
18 7 cold
20 10 cold
22 5 warm
24 6 warm
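
As a minimal sketch of how such a file could be read (this is not the book's own implementation; the name parse_data and the reading of each line as two integer coordinates followed by a class label are our assumptions), each line can be split on whitespace:

```python
# Hypothetical helper, not part of the book's source code.
# The eight known data items, copied from the listing above.
RAW_DATA = """\
10 0 cold
25 0 warm
15 5 cold
20 3 warm
18 7 cold
20 10 cold
22 5 warm
24 6 warm
"""


def parse_data(text):
    """Turn 'x y label' lines into a list of (int, int, str) tuples."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as the '# source_code/...' header.
        if not line or line.startswith("#"):
            continue
        x, y, label = line.split()
        items.append((int(x), int(y), label))
    return items


known = parse_data(RAW_DATA)
print(len(known))  # 8
```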
Output:
We run the implementation above on the input file
mary_and_temperature_preferences.data using the k-NN algorithm with k=1
neighbor. The algorithm classifies every point with integer coordinates in the
rectangle of size (30-5=25) by (10-0=10), that is, a total of (25+1) * (10+1) =
286 integer points (we add one to each side to count the points on the boundaries).
Using the wc command, we find that the output file contains exactly 286 lines, one
data item per point. Using the head command, we display the first 10 lines of the
output file. We visualize all the data from the output file in the next section:
$ python knn_to_data.py mary_and_temperature_preferences.data \
    mary_and_temperature_preferences_completed.data 1 5 30 0 10
$ wc -l mary_and_temperature_preferences_completed.data
286 mary_and_temperature_preferences_completed.data
$ head -10 mary_and_temperature_preferences_completed.data
7 3 cold
6 9 cold
12 1 cold
16 6 cold
16 9 cold
14 4 cold
13 4 cold
19 4 warm
18 4 cold
15 1 cold
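
The point count and some of the labels above can be checked with a short sketch. It is not the knn_to_data.py program itself: we assume the Manhattan distance as the metric, and the names KNOWN, manhattan, and classify_1nn are hypothetical. Ties are broken here by list order, so a point that is equidistant from a warm and a cold item, such as (18, 4) under this metric, may receive a different label than in the listing. The sketch also emits the points in sorted order, unlike the output file.

```python
from itertools import product

# Known data items from the input file: (x, y, label).
KNOWN = [
    (10, 0, "cold"), (25, 0, "warm"), (15, 5, "cold"), (20, 3, "warm"),
    (18, 7, "cold"), (20, 10, "cold"), (22, 5, "warm"), (24, 6, "warm"),
]


def manhattan(ax, ay, bx, by):
    """Manhattan (taxicab) distance; the choice of metric is an assumption."""
    return abs(ax - bx) + abs(ay - by)


def classify_1nn(x, y, known=KNOWN):
    """Return the label of the single nearest known item (k=1).

    min() keeps the first item among equally distant ones, so the
    tie-breaking is arbitrary and may differ from the book's program.
    """
    nearest = min(known, key=lambda item: manhattan(x, y, item[0], item[1]))
    return nearest[2]


# All integer points with 5 <= x <= 30 and 0 <= y <= 10, boundaries included.
grid = list(product(range(5, 31), range(0, 11)))
completed = [(x, y, classify_1nn(x, y)) for x, y in grid]

print(len(completed))        # 286, that is (25+1) * (10+1)
print(classify_1nn(19, 4))   # warm
print(classify_1nn(16, 6))   # cold
```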