Page 31 - Data Science Algorithms in a Week
P. 31
Classification Using K Nearest Neighbors
The aim is to predict whether Peter, aged 50, with an income of $80k/year, owns a house
and could be a potential customer for our insurance company.
Analysis:
In this case, we could try to apply the 1-NN algorithm. However, we should be careful
about how we are going to measure the distances between the data points, since the income
range is much wider than the age range. Income levels of $115k and $116k are $1,000 apart.
These two data points with these incomes would result in a very long distance. However,
relative to each other, the difference is not too large. Because we consider both measures
(age and yearly income) to be about as important, we would scale both from 0 to 1
according to the formula:
ScaledQuantity = (ActualQuantity-MinQuantity)/(MaxQuantity-MinQuantity)
In our particular case, this reduces to:
ScaledAge = (ActualAge-MinAge)/(MaxAge-MinAge)
ScaledIncome = (ActualIncome- inIncome)/(MaxIncome-inIncome)
[ 19 ]