Page 114 - Data Science Algorithms in a Week
P. 114
5
Clustering into K Clusters
Clustering is a technique to divide the data into clusters so that features in the same cluster
are in a certain sense similar.
In this chapter you will learn:
The k-means clustering algorithm on example about household incomes
An example about gender classification to classify features by clustering them
first with the features with the known classes
To implement k-means clustering algorithm in Python in section Implementation of
k-means clustering algorithm
An example about house ownership and how to choose an appropriate number
of clusters for your analysis
Using the example about house ownership how to scale given data appropriately
to improve the accuracy of the classification by a clustering algorithm
An example about document clustering to understand how a different number of
clusters alters the meaning of the dividing boundary between the clusters
Household incomes - clustering into k
clusters
For example let us take households with the yearly earnings in USD dollars 40k, 55k, 70k,
100k, 115k, 130k, 135k. Then if we require to cluster the households into the two clusters
taking their earnings as a measure of similarity, then the first cluster would have the
households earning 40k, 55k, 70k; the second cluster would have the households earning
100k, 115k, 130k, 135k.